Hyphenation: from transformer models and word embeddings to a new linguistic rule-set
Modern language models, especially those based on deep neural networks, frequently use bottom-up vocabulary generation techniques such as Byte Pair Encoding (BPE) to create word pieces, enabling them to model any sequence of text with a fixed-size vocabulary significantly smaller than the full training vocabulary.
The resulting language models often prove extremely capable. Yet, when integrated into traditional Automatic Speech Recognition (ASR) pipelines, these language models can perform poorly on rare or unseen text, because the resulting word pieces often do not map cleanly to phoneme sequences (consider, for instance, Multilingual BERT's unfortunate segmentation of Sonnenlicht into Sonne+nl+icht). This impairs the acoustic model's ability to generate the required token sequences, preventing good options from being considered in the first place.
While approaches like Morfessor attempt to solve this problem with more refined algorithms, they use only the written form of a word as input, splitting words into parts while disregarding the word's actual meaning.
Meanwhile, high-quality word embeddings have become widely available for languages like Dutch; this project investigates whether this knowledge of a word's usage in context can be leveraged to improve hyphenation quality.
For this purpose, the following approach is evaluated: a baseline Transformer model is tasked with generating hyphenation candidates for a given word based on its written form, and those candidates are subsequently reranked based on the embedding of the hyphenated word. The results are compared with those produced by Morfessor on the same dataset.
Finally, a new set of linguistic rules for Dutch hyphenation (suitable for use with Liang's hyphenation algorithm from TeX82) is presented. The output of these rules is compared to currently available rule-sets.
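The rule-set above targets Liang's pattern-based hyphenation algorithm from TeX82, in which digit-annotated patterns vote on break points and odd maxima mark hyphens. A minimal Python sketch of how such patterns are applied (the patterns shown are toy examples, not the proposed Dutch rule-set):

```python
import re

def parse_pattern(pat):
    """Split a Liang pattern like 'n1l' into its letters and the
    digit weights that sit between (and around) those letters."""
    letters = re.sub(r"\d", "", pat)
    weights = [0] * (len(letters) + 1)
    i = 0
    for ch in pat:
        if ch.isdigit():
            weights[i] = int(ch)
        else:
            i += 1
    return letters, weights

def hyphenate(word, patterns, min_left=2, min_right=2):
    """Apply Liang's algorithm: take the max digit voted at each
    inter-letter position; odd maxima become hyphenation points."""
    w = "." + word.lower() + "."          # '.' marks word boundaries
    scores = [0] * (len(w) + 1)
    for letters, weights in map(parse_pattern, patterns):
        for start in range(len(w) - len(letters) + 1):
            if w[start:start + len(letters)] == letters:
                for k, d in enumerate(weights):
                    scores[start + k] = max(scores[start + k], d)
    pieces, last = [], 0
    for i in range(1, len(word)):         # break before word[i]?
        if scores[i + 1] % 2 == 1 and min_left <= i <= len(word) - min_right:
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces

# Toy pattern 'n1l' votes for a break between 'n' and 'l':
print(hyphenate("sonnenlicht", ["n1l"]))
```

Even digits inhibit breaks, which is how real rule-sets suppress bad splits: adding a pattern like "n4l" above would override the odd vote and keep the word whole.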
Self-Knowledge in Epictetus and Marcus Aurelius
This article explores the theme of self-knowledge in the Stoic philosophers Epictetus and Marcus Aurelius. In light of the Socratic definition of gnothi seauton (know thyself), we propose to examine the "philosophy of the self" that Epictetus and Marcus Aurelius developed. More specifically, we aim to explicate the famous distinction that Epictetus draws in his Manual (later taken up by Marcus Aurelius in his Meditations) between what depends on us (judgments, impulses, desires, aversions, etc.) and what does not depend on us (the body, fame, wealth, power). From the Stoic perspective shared by Epictetus and Marcus Aurelius, we seek to show that "knowing oneself" means being able to identify what falls under our own jurisdiction and is therefore not subject to Fate.
Automatic Glossary of Clinical Terminology: a Large-Scale Dictionary of Biomedical Definitions Generated from Ontological Knowledge
Background: More than 400,000 biomedical concepts and some of their
relationships are contained in SnomedCT, a comprehensive biomedical ontology.
However, their concept names are not always readily interpretable by
non-experts or by patients looking at their own electronic health records (EHRs).
Clear definitions or descriptions in understandable language are often not
available. Therefore, generating human-readable definitions for biomedical
concepts might help make the information they encode more accessible and
understandable to a wider public.
Objective: In this article, we introduce the Automatic Glossary of Clinical
Terminology (AGCT), a large-scale biomedical dictionary of clinical concepts
generated using high-quality information extracted from the biomedical
knowledge contained in SnomedCT.
Methods: We generate a novel definition for every SnomedCT concept by
prompting the OpenAI Turbo model, a variant of GPT-3.5, with a high-quality
verbalization of the SnomedCT relationships of the concept to be defined. A
significant subset of the generated definitions was subsequently judged by NLP
researchers with biomedical expertise on 5-point scales along the following
three axes: factuality, insight, and fluency.
Results: AGCT contains 422,070 computer-generated definitions for SnomedCT
concepts, covering various domains such as diseases, procedures, drugs, and
anatomy. The average length of the definitions is 49 words. The definitions
were assigned average scores of over 4.5 out of 5 on all three axes, indicating
a majority of factual, insightful, and fluent definitions.
Conclusion: AGCT is a novel and valuable resource for biomedical tasks that
require human-readable definitions for SnomedCT concepts. It can also serve as
a base for developing robust biomedical retrieval models or other applications
that leverage natural language understanding of biomedical knowledge.
Comment: Accepted at the BioNLP 2023 workshop.
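A minimal sketch of the kind of pipeline the Methods section describes: verbalize a concept's ontology relationships into plain English and assemble a definition prompt. The relation names, prompt wording, and helper functions here are illustrative assumptions, not the authors' actual implementation (which sends the prompt to the OpenAI Turbo model):

```python
def verbalize(concept, relations):
    """Turn (relation, target) pairs into plain-English statements,
    e.g. ('is_a', 'disorder of the heart') -> '... is a disorder of the heart.'"""
    return " ".join(
        f"{concept} {rel.replace('_', ' ')} {target}."
        for rel, target in relations
    )

def build_prompt(concept, relations):
    """Assemble a definition-generation prompt from the verbalized facts.
    The wording is a hypothetical example, not the paper's prompt."""
    facts = verbalize(concept, relations)
    return (f"Known facts: {facts}\n"
            f"Write a short, patient-friendly definition of '{concept}'.")

# Hypothetical relations for one concept:
prompt = build_prompt(
    "Myocardial infarction",
    [("is_a", "disorder of the heart"),
     ("finding_site", "myocardium")],
)
print(prompt)
```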
Assessment of field rolling resistance of manual wheelchairs
This article proposes a simple and convenient method for assessing the subject-specific rolling resistance acting on a manual wheelchair, which could be used during the provision of clinical services. The method, based on a simple mathematical equation, is sensitive to both the total mass and its fore-aft distribution, which change with the subject, the wheelchair properties, and its adjustments. The rolling resistance properties of three types of front casters and four types of rear wheels were determined for two indoor surfaces commonly encountered by wheelchair users (a hard smooth surface and carpet) from three-dimensional accelerometer measurements taken during field deceleration tests performed with an artificial load. The average results of these experiments were then used as input data to assess the rolling resistance from the mathematical equation with acceptable accuracy on hard smooth and carpet surfaces (standard errors of the estimates were 4.4 and 3.9 N, respectively). Thus, this method can be confidently used by clinicians to help users make trade-offs between front and rear wheel types and sizes when choosing and adjusting their manual wheelchair.
This material was based on work supported by the SACR-FRM project, French National Research Agency (ANR-06-TecSan-020), and the Centre d'Études et de Recherche sur l'Appareillage des Handicapés, which loaned all MWCs required to fulfill this work.
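A minimal sketch of the coast-down principle behind such field deceleration tests: on level ground, once propulsion stops, rolling resistance is the dominant decelerating force, so it can be estimated as total mass times the measured deceleration. This is an illustrative simplification with made-up numbers, not the article's actual subject-specific equation:

```python
def rolling_resistance(total_mass_kg, deceleration_ms2):
    """Estimate rolling resistance force (N) from a coast-down test,
    assuming level ground and negligible air drag: F_rr ~= m * |a|."""
    return total_mass_kg * abs(deceleration_ms2)

# Hypothetical example: a 100 kg user+wheelchair system measured
# decelerating at 0.12 m/s^2 during free rolling.
force_n = rolling_resistance(100.0, -0.12)
print(f"Estimated rolling resistance: {force_n:.1f} N")
```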
Gorgias's Reaction to Parmenides' Poem: the Elaboration of Rhetoric as a Flight from Ontology
The main objective of this master's thesis is to closely examine the arguments that Gorgias advances in his treatise On Non-Being to support his thesis of the impossibility of knowledge. These arguments are three in number: a) nothing exists; b) even if something exists, it cannot be known; c) even if it can be known, it cannot be communicated to others. Beyond attacking the Parmenidean thesis of the correspondence between thinking (noein) and being (einai), these arguments also serve to justify the art of rhetoric. Indeed, without the knowledge that would allow us to distinguish truth from falsehood, human beings are left with nothing but their personal interests and the rhetorical means of making them prevail. In such a context, rhetoric becomes, strictly speaking, the only truly legitimate art.
Neural Posterior Estimation with Differentiable Simulators
Simulation-Based Inference (SBI) is a promising Bayesian inference framework
that alleviates the need for analytic likelihoods to estimate posterior
distributions. Recent advances using neural density estimators in SBI
algorithms have demonstrated the ability to achieve high-fidelity posteriors,
at the expense of a large number of simulations, which makes their application
potentially very time-consuming when using complex physical simulations. In
this work we focus on boosting the sample-efficiency of posterior density
estimation using the gradients of the simulator. We present a new method to
perform Neural Posterior Estimation (NPE) with a differentiable simulator. We
demonstrate how gradient information helps constrain the shape of the posterior
and improves sample-efficiency.
Comment: Accepted at the ICML 2022 Workshop on Machine Learning for Astrophysics.
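To illustrate the general idea of amortized posterior estimation underlying NPE (without the paper's neural density estimator or its simulator-gradient augmentation), here is a toy sketch that fits a linear-Gaussian conditional density q(theta | x) to simulated pairs by maximum likelihood. The simulator, prior, and conditional family are all assumptions chosen for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy differentiable simulator: x = theta + Gaussian noise,
# with prior theta ~ N(0, 1). (A real SBI problem would use a
# complex physical simulator.)
theta = rng.normal(0.0, 1.0, size=10_000)
x = theta + rng.normal(0.0, 0.5, size=theta.shape)

# "Posterior estimator": the linear-Gaussian family
# q(theta | x) = N(a*x + b, s^2), fit by maximum likelihood,
# which here reduces to least squares on the simulated pairs.
a, b = np.polyfit(x, theta, 1)
s = np.std(theta - (a * x + b))

# Amortized inference: the fitted q gives a posterior for any
# new observation without further simulation.
x_obs = 1.0
post_mean, post_std = a * x_obs + b, s
print(f"q(theta | x=1.0) ~= N({post_mean:.2f}, {post_std:.2f}^2)")
```

For this linear-Gaussian toy the exact posterior is known (mean 0.8, std about 0.45 at x_obs = 1.0), so the fit can be checked against it; a neural density estimator plays the same role for simulators without analytic posteriors.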